{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# HVAC with Amazon SageMaker RL\n",
    "\n",
    "---\n",
    "## Introduction\n",
    "\n",
    "\n",
    "HVAC stands for Heating, Ventilation and Air Conditioning and is responsible for keeping us warm and comfortable indoors.  HVAC takes up a whopping 50% of the energy in a building and accounts for 40% of energy use in the US [1, 2]. Several control system optimizations have been proposed to reduce energy usage while ensuring thermal comfort.\n",
    "\n",
    "Modern buildings collect data about the weather, occupancy and equipment use. All of this can be used to optimize HVAC energy usage. Reinforcement Learning (RL) is a good fit because it can learn how to interact with the environment and identify strategies to limit wasted energy. Several recent research efforts have shown that RL can reduce HVAC energy consumption by 15-20% [3, 4].\n",
    "\n",
    "As training an RL algorithm in a real HVAC system can take time to converge as well as potentially lead to hazardous settings as the agent explores its state space, we turn to a simulator to train the agent. [EnergyPlus](https://energyplus.net/) is an open source, state of the art HVAC simulator from the US Department of Energy. We use a simple example with this simulator to showcase how we can train an RL model easily with Amazon SageMaker RL.\n",
    "\n",
    "1. Objective: Control the data center HVAC system to reduce energy consumption while ensuring the room temperature stays within specified limits.\n",
    "2. Environment: We have a small single room datacenter that the HVAC system is cooling to ensure the compute equipment works properly. We will train our RL agent to control this HVAC system for one day subject to weather conditions in San Francisco.  The agent takes actions every 5 minutes for a 24 hour period. Hence, the episode is a fixed 120 steps. \n",
    "3. State: The outdoor temperature, outdoor humidity and indoor room temperature.\n",
    "4. Action: The agent can set the heating and cooling setpoints. The cooling setpoint tells the HVAC system that it should start cooling the room if the room temperature goes above this setpoint. Likewise, the HVAC systems starts heating if the room temperature goes below the heating setpoint.\n",
    "5. Reward: The rewards has two components which are added together with coefficients: \n",
    "    1. It is proportional to the energy consumed by the HVAC system.\n",
    "    2. It gets a large penalty when the room temperature exceeds pre-specified lower or upper limits (as defined in `data_center_env.py`).\n",
    "\n",
    "References\n",
    "\n",
    "1. [sciencedirect.com](https://www.sciencedirect.com/science/article/pii/S0378778807001016)\n",
    "2. [environment.gov.au](https://www.environment.gov.au/system/files/energy/files/hvac-factsheet-energy-breakdown.pdf)\n",
    "3. Wei, Tianshu, Yanzhi Wang, and Qi Zhu. \"Deep reinforcement learning for building hvac control.\" In Proceedings of the 54th Annual Design Automation Conference 2017, p. 22. ACM, 2017.\n",
    "4. Zhang, Zhiang, and Khee Poh Lam. \"Practical implementation and evaluation of deep reinforcement learning control for a radiant heating system.\" In Proceedings of the 5th Conference on Systems for Built Environments, pp. 148-157. ACM, 2018."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-requisites \n",
    "\n",
    "### Imports\n",
    "\n",
    "To get started, we'll import the Python libraries we need, set up the environment with a few prerequisites for permissions and configurations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "import sys\n",
    "import os\n",
    "import glob\n",
    "import re\n",
    "import subprocess\n",
    "import numpy as np\n",
    "from IPython.display import HTML\n",
    "import time\n",
    "from time import gmtime, strftime\n",
    "sys.path.append(\"common\")\n",
    "from misc import get_execution_role, wait_for_s3_object\n",
    "from docker_utils import build_and_push_docker_image\n",
    "from sagemaker.rl import RLEstimator, RLToolkit, RLFramework"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup S3 bucket\n",
    "\n",
    "Create a reference to the default S3 bucket that will be used for model outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sage_session = sagemaker.session.Session()\n",
    "s3_bucket = sage_session.default_bucket()  \n",
    "s3_output_path = 's3://{}/'.format(s3_bucket)\n",
    "print(\"S3 bucket path: {}\".format(s3_output_path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Variables \n",
    "\n",
    "We define a job below that's used to identify our jobs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create unique job name \n",
    "job_name_prefix = 'rl-hvac'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure settings\n",
    "\n",
    "You can run your RL training jobs locally on the SageMaker notebook instance or on SageMaker training. In both of these scenarios, you can run in either 'local' (where you run the commands) or 'SageMaker' mode (on SageMaker training instances). 'local' mode uses the SageMaker Python SDK to run your code in Docker containers locally. It can speed up iterative testing and debugging while using the same familiar Python SDK interface. Just set `local_mode = True`. And when you're ready move to 'SageMaker' mode to scale things up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run local (on this machine)?\n",
    "# or on sagemaker training instances?\n",
    "local_mode = False\n",
    "\n",
    "if local_mode:\n",
    "    instance_type = 'local'\n",
    "else:\n",
    "    # choose a larger instance to avoid running out of memory\n",
    "    instance_type = \"ml.m4.4xlarge\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an IAM role\n",
    "\n",
    "Either get the execution role when running from a SageMaker notebook instance `role = sagemaker.get_execution_role()` or, when running from local notebook instance, use utils method `role = get_execution_role()` to create an execution role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except:\n",
    "    role = get_execution_role()\n",
    "\n",
    "print(\"Using IAM role arn: {}\".format(role))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install docker for `local` mode\n",
    "\n",
    "In order to work in `local` mode, you need to have docker installed. When running from your local machine, please make sure that you have docker or docker-compose (for local CPU machines) and nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can simply run the following script to install dependencies.\n",
    "\n",
    "Note, you can only run a single local notebook at one time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Only run from SageMaker notebook instance\n",
    "if local_mode:\n",
    "    !/bin/bash ./common/setup.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build docker container\n",
    "\n",
    "Since we're working with a custom environment with custom dependencies, we create our own container for training. We:\n",
    "\n",
    "1. Fetch the base MXNet and Coach container image,\n",
    "2. Install EnergyPlus and its dependencies on top,\n",
    "3. Upload the new container image to AWS ECR."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'\n",
    "repository_short_name = \"sagemaker-hvac-coach-%s\" % cpu_or_gpu\n",
    "docker_build_args = {\n",
    "    'CPU_OR_GPU': cpu_or_gpu, \n",
    "    'AWS_REGION': boto3.Session().region_name,\n",
    "}\n",
    "custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)\n",
    "print(\"Using ECR image %s\" % custom_image_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup the environment\n",
    "\n",
    "The environment is defined in a Python file called `data_center_env.py` and for SageMaker training jobs, the file will be uploaded inside the `/src` directory.\n",
    "\n",
    "The environment implements the init(), step() and reset() functions that describe how the environment behaves. This is consistent with Open AI Gym interfaces for defining an environment.\n",
    "\n",
    "1. `init()` - initialize the environment in a pre-defined state\n",
    "2. `step()` - take an action on the environment\n",
    "3. `reset()` - restart the environment on a new episode"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure the presets for RL algorithm \n",
    "\n",
    "The presets that configure the RL training jobs are defined in the “preset-energy-plus-clipped-ppo.py” file which is also uploaded as part of the `/src` directory. Using the preset file, you can define agent parameters to select the specific agent algorithm. You can also set the environment parameters, define the schedule and visualization parameters, and define the graph manager. The schedule presets will define the number of heat up steps, periodic evaluation steps, training steps between evaluations, etc.\n",
    "\n",
    "All of these can be overridden at run-time by specifying the `RLCOACH_PRESET` hyperparameter. Additionally, it can be used to define custom hyperparameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pygmentize src/preset-energy-plus-clipped-ppo.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write the Training Code \n",
    "\n",
    "The training code is written in the file “train-coach.py” which is uploaded in the /src directory. \n",
    "First import the environment files and the preset files, and then define the main() function. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pygmentize src/train-coach.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the RL model using the Python SDK Script mode\n",
    "\n",
    "If you are using local mode, the training will run on the notebook instance. When using SageMaker for training, you can select a GPU or CPU instance. The RLEstimator is used for training RL jobs. \n",
    "\n",
    "1. Specify the source directory where the environment, presets and training code is uploaded.\n",
    "2. Specify the entry point as the training code \n",
    "3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container. \n",
    "4. Define the training parameters such as the instance count, job name, S3 path for output and job name. \n",
    "5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET can be used to specify the RL agent algorithm you want to use. \n",
    "6. [optional] Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator = RLEstimator(entry_point=\"train-coach.py\",\n",
    "                        source_dir='src',\n",
    "                        dependencies=[\"common/sagemaker_rl\"],\n",
    "                        image_name=custom_image_name,\n",
    "                        role=role,\n",
    "                        train_instance_type=instance_type,\n",
    "                        train_instance_count=1,\n",
    "                        output_path=s3_output_path,\n",
    "                        base_job_name=job_name_prefix,\n",
    "                        hyperparameters = {\n",
    "                            'save_model': 1\n",
    "                        }\n",
    "                    )\n",
    "\n",
    "estimator.fit(wait=local_mode)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Store intermediate training output and model checkpoints \n",
    "\n",
    "The output from the training job above is stored on S3. The intermediate folder contains gifs and metadata of the training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job_name=estimator._current_job_name\n",
    "print(\"Job name: {}\".format(job_name))\n",
    "\n",
    "s3_url = \"s3://{}/{}\".format(s3_bucket,job_name)\n",
    "\n",
    "if local_mode:\n",
    "    output_tar_key = \"{}/output.tar.gz\".format(job_name)\n",
    "else:\n",
    "    output_tar_key = \"{}/output/output.tar.gz\".format(job_name)\n",
    "\n",
    "intermediate_folder_key = \"{}/output/intermediate/\".format(job_name)\n",
    "output_url = \"s3://{}/{}\".format(s3_bucket, output_tar_key)\n",
    "intermediate_url = \"s3://{}/{}\".format(s3_bucket, intermediate_folder_key)\n",
    "\n",
    "print(\"S3 job path: {}\".format(s3_url))\n",
    "print(\"Output.tar.gz location: {}\".format(output_url))\n",
    "print(\"Intermediate folder path: {}\".format(intermediate_url))\n",
    "    \n",
    "tmp_dir = \"/tmp/{}\".format(job_name)\n",
    "os.system(\"mkdir {}\".format(tmp_dir))\n",
    "print(\"Create local folder {}\".format(tmp_dir))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot metrics for training job\n",
    "We can pull the reward metric of the training and plot it to see the performance of the model over time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "\n",
    "csv_file_name = \"worker_0.simple_rl_graph.main_level.main_level.agent_0.csv\"\n",
    "key = os.path.join(intermediate_folder_key, csv_file_name)\n",
    "wait_for_s3_object(s3_bucket, key, tmp_dir)\n",
    "\n",
    "csv_file = \"{}/{}\".format(tmp_dir, csv_file_name)\n",
    "df = pd.read_csv(csv_file)\n",
    "df = df.dropna(subset=['Training Reward'])\n",
    "x_axis = 'Episode #'\n",
    "y_axis = 'Training Reward'\n",
    "\n",
    "plt = df.plot(x=x_axis,y=y_axis, figsize=(12,5), legend=True, style='b-')\n",
    "plt.set_ylabel(y_axis);\n",
    "plt.set_xlabel(x_axis);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation of RL models\n",
    "\n",
    "We use the last checkpointed model to run evaluation for the RL Agent. \n",
    "\n",
    "### Load checkpointed model\n",
    "\n",
    "Checkpointed data from the previously trained models will be passed on for evaluation / inference in the checkpoint channel. In local mode, we can simply use the local directory, whereas in the SageMaker mode, it needs to be moved to S3 first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "wait_for_s3_object(s3_bucket, output_tar_key, tmp_dir)  \n",
    "\n",
    "if not os.path.isfile(\"{}/output.tar.gz\".format(tmp_dir)):\n",
    "    raise FileNotFoundError(\"File output.tar.gz not found\")\n",
    "os.system(\"tar -xvzf {}/output.tar.gz -C {}\".format(tmp_dir, tmp_dir))\n",
    "\n",
    "if local_mode:\n",
    "    checkpoint_dir = \"{}/data/checkpoint\".format(tmp_dir)\n",
    "else:\n",
    "    checkpoint_dir = \"{}/checkpoint\".format(tmp_dir)\n",
    "\n",
    "print(\"Checkpoint directory {}\".format(checkpoint_dir))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if local_mode:\n",
    "    checkpoint_path = 'file://{}'.format(checkpoint_dir)\n",
    "    print(\"Local checkpoint file path: {}\".format(checkpoint_path))\n",
    "else:\n",
    "    checkpoint_path = \"s3://{}/{}/checkpoint/\".format(s3_bucket, job_name)\n",
    "    if not os.listdir(checkpoint_dir):\n",
    "        raise FileNotFoundError(\"Checkpoint files not found under the path\")\n",
    "    os.system(\"aws s3 cp --recursive {} {}\".format(checkpoint_dir, checkpoint_path))\n",
    "    print(\"S3 checkpoint file path: {}\".format(checkpoint_path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run the evaluation step\n",
    "\n",
    "Use the checkpointed model to run the evaluation step. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator_eval = RLEstimator(entry_point=\"evaluate-coach.py\",\n",
    "                             source_dir='src',\n",
    "                             dependencies=[\"common/sagemaker_rl\"],\n",
    "                             image_name=custom_image_name,\n",
    "                             role=role,\n",
    "                             train_instance_type=instance_type,\n",
    "                             train_instance_count=1,\n",
    "                             output_path=s3_output_path,\n",
    "                             base_job_name=job_name_prefix,\n",
    "                             hyperparameters = {\n",
    "                                 \"RLCOACH_PRESET\": \"preset-energy-plus-clipped-ppo\",\n",
    "                                 \"evaluate_steps\": 288*2, #2 episodes, i.e. 2 days\n",
    "                             }\n",
    "                            )\n",
    "\n",
    "estimator_eval.fit({'checkpoint': checkpoint_path})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model deployment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we specified a custom image when configuring the RLEstimator, we have to manually choose our deployment method. We'll choose to deploy with the MXNet Model since we used MXNet as the base container for our training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.mxnet import MXNetModel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = MXNetModel(model_data=estimator.model_data,\n",
    "                   role=role,\n",
    "                   entry_point='deploy-mxnet-coach.py',\n",
    "                   source_dir='src',\n",
    "                   dependencies=[\"common/sagemaker_rl\"],\n",
    "                   framework_version='1.3.0')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor = model.deploy(initial_instance_count=1,\n",
    "                         instance_type=instance_type,\n",
    "                         endpoint_name=job_name_prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can test the endpoint with a samples observation, where the current room temperature is high. Since the environment vector was of the form `[outdoor_temperature, outdoor_humidity, indoor_humidity]` and we used observation normalization in our preset, we choose an observation of `[0, 0, 2]`. Since we're deploying a PPO model, our model returns both state value and actions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "action, action_mean, action_std = predictor.predict(np.array([0., 0., 2.,]))\n",
    "action_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see heating and cooling setpoints are returned from the model, and these can be used to control the HVAC system for efficient energy usage. More training iterations will help improve the model further."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Clean up endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "conda_mxnet_p36",
   "language": "python",
   "name": "conda_mxnet_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
